We generalize recent work of Massar and Popescu dealing
with the amount of classical data that is produced by a
quantum measurement on a quantum state ensemble. In the
previous work it was shown that quantum measurements
generally contain spurious randomness in the outcomes and that
this spurious randomness can be eliminated by carrying out
collective measurements on many independent copies of the
system. In particular it was shown that, without decreasing
the amount of knowledge the measurement provides about the
quantum state, one can always reduce the amount of data
produced by the measurement to the entropy
$H(\rho) = -\mathrm{Tr}\,\rho \log \rho$ of the ensemble.

Here we extend this result by giving a more refined
description of what constitute equivalent measurements (that is,
measurements which provide the same knowledge about the
quantum state) and also by considering incomplete
measurements. In particular we show that one can always associate
to a POVM with elements $a_j$ an equivalent POVM acting
on many independent copies of the system which produces an
amount of data asymptotically equal to the entropy defect of
an ensemble canonically associated to the ensemble average
state $\rho$ and the initial measurement $(a_j)$. In the case where
the measurement is not maximally refined this amount of data
is strictly less than the amount $H(\rho)$ obtained in the previous
work. This result is obtained by a novel technique to analyze
random selections. We also show that this is the best
achievable, i.e. it is impossible to devise a measurement equivalent
to the initial measurement $(a_j)$ that produces less data.

We discuss the interpretation of these results. In particular 
we show how they can be used to provide a precise and model 
independent measure of the amount of knowledge that is ob- 
tained about a quantum state by a quantum measurement. 
We also discuss in detail the relation between our results and 
Holevo's bound, at the same time providing a new proof of 
this fundamental inequality. 



I. INTRODUCTION 

An essential aspect of quantum mechanics is the mea- 
surement process. Only by measuring can a macro- 
scopic observer obtain knowledge about a quantum sys- 
tem. However the knowledge that is obtained about the 
state of a quantum system is in general not complete 
since from the outcome of a measurement it is in general 
not possible to infer the initial state. Furthermore there 
are many different measurements that could be carried 
out on the system and these measurements are in general 
mutually incompatible. 

It is therefore natural to try to make measurements as 
efficient as possible. The simplest way one can make a 
measurement efficient is to devise it in such a way that it 
provides as much knowledge as possible about the state 
of the system (see footnote 1). This first approach has been
extensively studied. We note that in some cases it can be
interesting to make an incomplete measurement (which does
not provide maximum knowledge about the system). The
incomplete measurement can then be refined at a later stage
by carrying out a second measurement on the system.

The second way one can make a measurement efficient 
is to reduce the amount of classical data it produces. 
Indeed if a measurement produces outcome $j$ with
probability $p_j$, the amount of classical data produced by the
measurement is $I = -\sum_j p_j \log p_j$ (in this paper log and
exp are always to base 2). This second aspect of
optimizing measurements was first considered in [3].
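
As a small illustration of the quantity $I$, the following sketch computes the Shannon entropy (in bits) of an outcome distribution. The function name and the two example distributions are hypothetical and chosen only for illustration.

```python
import math

def outcome_data_bits(probs):
    """Shannon entropy -sum_j p_j log2 p_j of an outcome distribution, in bits."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes of probability zero contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical examples: four equally likely outcomes, and a skewed distribution.
uniform4 = outcome_data_bits([0.25, 0.25, 0.25, 0.25])
skewed = outcome_data_bits([0.5, 0.25, 0.125, 0.125])
```

A measurement with four outcomes therefore produces at most two bits of data per use, and fewer when the outcomes are not equally likely.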

Minimizing $I$ is interesting for two reasons. First it
makes the measurement less wasteful of resources since it
minimizes the amount of classical data that is produced.
Indeed the increase in entropy (in the thermodynamic
sense) due to the irreversibility of the measurement
process will be minimized if the amount of data $I$
produced by the measurement is minimized. Secondly, as
argued in [3], the minimum value of $I$ provides a model
independent answer to the question: how much knowledge
about a quantum system is obtained by a measurement?

1 In this article we shall distinguish between the words
"knowledge" and "information". Thus we shall say that a
measurement provides knowledge about the state of a
quantum system, rather than information. We introduce this
distinction because the second term is often associated with the
"mutual information" between the initial state and the
result of the measurement. And, as examples show, an efficient
measurement is not necessarily one that maximizes the
mutual information between the initial state and the result of
the measurement.

The main result of [3] was to show that it is always
possible to reduce $I$ so that it is less than or equal to the von
Neumann entropy of the ensemble of quantum states on
which the measurement is carried out. Thus the answer
to the above question is that a quantum measurement
can provide at most one bit of classical knowledge about
an unknown qubit.

However, minimizing $I$ is not an easy task. It must
be carried out at the level of the measurement itself and 
cannot be realized as a post-processing of the data pro- 
duced by the measurement. This is because there are 
positive operator valued measures (POVM) that provide 
maximum knowledge about the state and that have a 
number of outcomes that is larger than the von Neu- 
mann entropy of the ensemble. Such measurements add 
spurious randomness to their outcomes. To address this 
difficulty and remove the spurious randomness one must 
define a notion of "equivalent" measurements that yield 
the same knowledge about the quantum system and then 
search among this class of equivalent measurements for 
those which minimize the number of bits $I$ of classical
data produced by the measurement. It is important to 
include in the equivalence classes not only measurements 
on single states, but also measurements that act collec- 
tively on blocks of independent states. It is also essential 
to include in the equivalence class measurements that
differ infinitesimally. Such extensions are natural in the
context of information theory. We shall refer to the above
procedure as the "compression of quantum measurement
operations".

The results of [3] are incomplete in several ways and
we complete them in the present paper. In particular we 
give a more precise description of what constitute "equiv- 
alent" measurements. We then obtain lower bounds on 
the amount of classical data $I$ that can be produced by
equivalent measurements. Finally we construct measure- 
ments that attain the lower bound. Both results apply 
to general POVMs and in particular to incomplete mea- 
surements (for which the POVM elements are not all pro- 
portional to one dimensional projectors). 

II. PREVIOUS RESULTS 

In this section we shall recall the results obtained in [3].
This will serve as a basis for the presentation of our new 
results in the next section. 

Consider a quantum ensemble consisting of states
$|\psi_i\rangle$ (in the Hilbert space $\mathcal{H}$, which we assume
to be of finite dimension $d$ throughout the paper), with
probabilities $p_i$ ($i = 1, \ldots, n$), and a measurement POVM
$a = (a_j)_{j=1,\ldots,m}$. We suppose that the measurement
maximizes a fidelity

$$F(a) = \sum_i \sum_j p_i \langle\psi_i|a_j|\psi_i\rangle F_{ij},$$

where $F_{ij}$ is the contribution (or gain) in the case that
on being given state $|\psi_i\rangle$ the POVM hits upon guess $j$
(which happens with probability $\langle\psi_i|a_j|\psi_i\rangle$). Note that
this is equivalent to the objective of quantum estimation
theory [1], [2] to minimize the cost. The same
minimization problem occurs in the computation of the so-called
quantum rate distortion function, as defined in [4].
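
To make the definition of $F(a)$ concrete, here is a minimal sketch that evaluates it for a toy ensemble. The two real qubit states, the projective measurement and the gain matrix are hypothetical choices made for illustration; they are not taken from the paper.

```python
import math

def expectation(psi, op):
    """<psi|op|psi> for a real vector psi and a real symmetric matrix op."""
    n = len(psi)
    return sum(psi[r] * op[r][c] * psi[c] for r in range(n) for c in range(n))

def fidelity(probs, states, povm, gain):
    """F(a) = sum_i sum_j p_i <psi_i|a_j|psi_i> F_ij."""
    return sum(
        probs[i] * expectation(states[i], povm[j]) * gain[i][j]
        for i in range(len(states))
        for j in range(len(povm))
    )

# Hypothetical ensemble: two non-orthogonal real qubit states, equal priors.
theta = math.pi / 8
states = [[math.cos(theta), math.sin(theta)],
          [math.cos(theta), -math.sin(theta)]]
probs = [0.5, 0.5]

# Projective measurement onto (|0>+|1>)/sqrt(2) and (|0>-|1>)/sqrt(2).
povm = [[[0.5, 0.5], [0.5, 0.5]],
        [[0.5, -0.5], [-0.5, 0.5]]]

# Gain 1 for guessing the right state, 0 otherwise.
gain = [[1.0, 0.0], [0.0, 1.0]]

F = fidelity(probs, states, povm, gain)
```

With this gain matrix, $F$ is simply the probability of identifying the state correctly; other choices of $F_{ij}$ encode other properties of the state.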

Here the reason for introducing a fidelity is that it al- 
lows us to define in an implicit way a class of equivalent 
measurements. Indeed the Fij encode implicitly a prop- 
erty about which knowledge can be obtained by a mea- 
surement. And a measurement that maximizes F is an 
optimal measurement for this property. One then defines 
as equivalent all the measurements that maximize F. 

It is demonstrated by examples in [3] that the number
of outcomes $I$ of the optimal measurement can exceed the
von Neumann entropy of the ensemble. But it is proved 
that if a large number of independent states are avail- 
able, then one can find an almost optimal measurement 
that acts collectively on all the copies with logarithm of 
number of outcomes asymptotically bounded by the von 
Neumann entropy of the ensemble. This result can be 
formulated more precisely as follows: 

Introduce the density operators $\rho_i = |\psi_i\rangle\langle\psi_i|$, and the
average state $\rho = \sum_i p_i \rho_i$. We assume in the sequel that
$\rho > 0$ on $\mathcal{H}$ (otherwise pass to the support of $\rho$) and
that $\mathcal{H}$ is finite dimensional. Suppose that a number
$l$ of independent states are available. The $l$ states are
given by the density operator
$\rho_{i^l} = \rho_{i_1} \otimes \cdots \otimes \rho_{i_l}$, with
probability $p_{i^l} = p_{i_1} \cdots p_{i_l}$. Here and in what follows $i^l$
is an abbreviation for a tuple $(i_1, \ldots, i_l)$.

The fidelity for the $l$ independent states is defined as
the sum of the individual fidelities

$$F_{i^l j^l} = \frac{1}{l} \sum_{k=1}^{l} F_{i_k j_k}. \qquad (1)$$

This is a crucial aspect of the model: the fidelity on 
blocks is constructed from a sum of fidelities on the indi- 
vidual systems, in fact as the average of these fidelities. 

We now consider a POVM $A$ on $\mathcal{H}^{\otimes l}$ and we compute
the fidelity for this POVM. This POVM has $M$ outcomes
labeled by $\mu = 1, \ldots, M$. In order to compute the block
fidelity (1) we must associate to each POVM outcome
$\mu$ a tuple of guesses $j^l_\mu$. Hence the individual POVM
elements will be denoted $A_{j^l_\mu}$.

One possible example is the product POVM $a^{\otimes l}$ which
consists of all operators
$a_{j^l} = a_{j_1} \otimes \cdots \otimes a_{j_l}$. One easily
checks that in this case the fidelity on blocks

$$F(a^{\otimes l}) = \sum_{i^l} p_{i^l} \sum_{j^l} \mathrm{Tr}(\rho_{i^l} a_{j^l}) F_{i^l j^l}$$

equals the single letter fidelity $F(a)$. In this case the
number of outcomes is equal to the maximum possible
number of guesses, $m^l$. However in general the number
of guesses M may be smaller than the number of possible 
tuples. Thus there can be some tuples that are never 
associated with a POVM element, and hence can never 
constitute a guess. However even when M is less than 
the number of possible guesses, we can still compute the 
average fidelity. 
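
The claim that the product POVM reproduces the single letter fidelity can be checked numerically. The sketch below assumes only that for product states and product POVM elements the outcome probabilities factorize, $\mathrm{Tr}(\rho_{i^l} a_{j^l}) = \prod_k \mathrm{Tr}(\rho_{i_k} a_{j_k})$, so that it suffices to work with the table `q[i][j]` standing for $\mathrm{Tr}(\rho_i a_j)$. All numbers are hypothetical.

```python
from itertools import product

def single_letter_fidelity(p, q, gain):
    """F(a) = sum_ij p_i q[i][j] F_ij, where q[i][j] stands for Tr(rho_i a_j)."""
    return sum(p[i] * q[i][j] * gain[i][j]
               for i in range(len(p)) for j in range(len(q[0])))

def product_block_fidelity(p, q, gain, l):
    """Block fidelity of the product POVM on l copies, summed over all tuples."""
    n, m = len(p), len(q[0])
    total = 0.0
    for i_tuple in product(range(n), repeat=l):
        for j_tuple in product(range(m), repeat=l):
            weight = 1.0
            for i, j in zip(i_tuple, j_tuple):
                weight *= p[i] * q[i][j]   # probabilities factorize over positions
            # Block gain is the average of the single-position gains, as in (1).
            avg_gain = sum(gain[i][j] for i, j in zip(i_tuple, j_tuple)) / l
            total += weight * avg_gain
    return total

# Hypothetical data: n = 2 states, m = 3 guesses.
p = [0.3, 0.7]
q = [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
gain = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]

F1 = single_letter_fidelity(p, q, gain)
F3 = product_block_fidelity(p, q, gain, 3)
outcomes = len(q[0]) ** 3
```

The two fidelities agree, while the product POVM on three copies already has $m^l = 27$ outcomes; this exponential growth is what compression is meant to reduce.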

What we are after is a POVM $A = (A_{j^l_\mu})_{\mu=1,\ldots,M}$
on $\mathcal{H}^{\otimes l}$ whose fidelity $F(A)$ is close to the optimal
fidelity $F_{opt}$ and with a minimal number $M$ of outcomes.
This will constitute a POVM belonging to the
equivalence class for which all the spurious redundancies have
been eliminated. The central result of [3] is the
construction of such a POVM:

Theorem 1 (Massar, Popescu [3]) For $\epsilon > 0$ and $l$
large enough there exists a POVM $A$ with fidelity
$F(A) \geq F_{opt} - \epsilon$ and

$$M \leq \exp(l(H(\rho) + \epsilon))$$

many outcomes, where $H(\rho) = -\mathrm{Tr}\,\rho \log \rho$ is the von
Neumann entropy. □

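We can rewrite the fidelity of $A$ as

$$F(A) = \sum_{i^l} p_{i^l} \sum_{\mu} \mathrm{Tr}(\rho_{i^l} A_{j^l_\mu}) \frac{1}{l} \sum_{k=1}^{l} F_{i_k j_{\mu k}}$$
$$\phantom{F(A)} = \frac{1}{l} \sum_{k=1}^{l} \sum_i \sum_{\mu} p_i \mathrm{Tr}\left( (\rho^{\otimes (k-1)} \otimes \rho_i \otimes \rho^{\otimes (l-k)}) A_{j^l_\mu} \right) F_{i j_{\mu k}}$$
$$\phantom{F(A)} = \frac{1}{l} \sum_{k=1}^{l} \sum_i \sum_j p_i \mathrm{Tr}(\rho_i A_j^{(k)}) F_{ij}, \qquad (2)$$

where $j_{\mu k}$ denotes the $k$'th entry of the tuple $j^l_\mu$ and
(with $[l] = \{1, \ldots, l\}$)

$$A_j^{(k)} = \sum_{\mu:\, j_{\mu k} = j} \mathrm{Tr}_{[l] \setminus k} \left( A_{j^l_\mu} (\rho^{\otimes [l] \setminus k} \otimes 1) \right). \qquad (3)$$

To prove the second and third equality recall the defining
property of the partial trace on the composite system
$\mathcal{H}_1 \otimes \mathcal{H}_2$:

$$\forall A \quad \mathrm{Tr}(A\, \mathrm{Tr}_2 C) = \mathrm{Tr}((A \otimes 1) C).$$

Note that for all $k$ the $A_j^{(k)}$ ($j = 1, \ldots, m$) form a
POVM which we shall refer to as marginals of $A$. The
marginals of $A$ describe the action of the POVM $A$
restricted to the $k$'th state in the block. They will play a
central role in what follows.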
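
The marginals can be computed explicitly in a small case. The sketch below treats $l = 2$ qubits with real matrices and evaluates the first marginal as $\sum_{\mu: j_{\mu 1} = j} \mathrm{Tr}_2(A_{j^l_\mu}(1 \otimes \rho))$, which is how equation (3) is read here for $k = 1$. The state `rho`, the POVM and all helper names are hypothetical; for the product POVM the marginal should simply return $a_j$.

```python
def kron(a, b):
    """Kronecker product of two square matrices stored as nested lists."""
    return [[a[r1][c1] * b[r2][c2]
             for c1 in range(len(a)) for c2 in range(len(b))]
            for r1 in range(len(a)) for r2 in range(len(b))]

def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def trace_out_second(x, d):
    """Partial trace over the second d-dimensional tensor factor."""
    return [[sum(x[r * d + s][c * d + s] for s in range(d)) for c in range(d)]
            for r in range(d)]

def first_marginal(block_povm, labels, j, rho):
    """Sum, over elements whose first guess is j, of Tr_2(A (1 x rho))."""
    d = len(rho)
    identity = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    weight = kron(identity, rho)
    total = [[0.0] * d for _ in range(d)]
    for element, label in zip(block_povm, labels):
        if label[0] == j:
            reduced = trace_out_second(matmul(element, weight), d)
            total = [[total[r][c] + reduced[r][c] for c in range(d)]
                     for r in range(d)]
    return total

# Hypothetical average state and single-copy POVM (real symmetric matrices).
rho = [[0.75, 0.1], [0.1, 0.25]]
povm = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, -0.5], [-0.5, 0.5]]]

# Product POVM on two copies, each element labeled by its tuple of guesses.
labels = [(j1, j2) for j1 in range(2) for j2 in range(2)]
block = [kron(povm[j1], povm[j2]) for j1, j2 in labels]
marginals = [first_marginal(block, labels, j, rho) for j in range(2)]
```

For a compressed POVM the marginals would only approximate the $a_j$, which is exactly what the conditions of the next section quantify.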



III. MODEL AND MAIN RESULTS 

Theorem 1 is incomplete in several ways: Why do the
ensemble states not enter, only their average? Is it im- 
portant that they are pure? Also, what is the deeper 
reason that the fidelity matrix does not enter, nor the 
structure of the optimal measurement? Is the bound on 
M optimal, or better: under which conditions is it opti- 
mal? The results below will help clarify these questions. 

We start by analyzing the fidelity constraint on the 
interesting POVMs: this will lead to a series of conditions 
(C0-C3) of increasing strength. Theorem 1 lets us start
out from the condition 



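$$|F(A) - F(a)| \leq \epsilon. \qquad (C0)$$

Looking again at (2) we observe that $F(A)$ is an average
over the $l$ positions of equally structured quantities: each
is an average of the $F_{ij}$, with probabilities $p_i \mathrm{Tr}(\rho_i A_j^{(k)})$.
Thus, assuming that the $F_{ij}$ are (without loss of
generality) bounded by 1, a POVM $A$ on $\mathcal{H}^{\otimes l}$ will obtain a
fidelity within $\epsilon$ of $F(a)$ (for any measurement $a$, not only
the optimal POVM $a$ on $\mathcal{H}$) if, for all $k$, the distribution
$(p_i \mathrm{Tr}(\rho_i A_j^{(k)}))_{ij}$ is close to
$(p_i \mathrm{Tr}(\rho_i a_j))_{ij}$, i.e.

$$\forall k \quad \sum_{ij} \left| p_i \mathrm{Tr}(\rho_i A_j^{(k)}) - p_i \mathrm{Tr}(\rho_i a_j) \right| \leq \epsilon. \qquad (C1)$$

This will be satisfied if for each position $k$ and each $i$ the
corresponding sub-terms are close:

$$\forall k\, \forall i \quad \sum_j \left| \mathrm{Tr}(\rho_i A_j^{(k)}) - \mathrm{Tr}(\rho_i a_j) \right| \leq \epsilon. \qquad (C2)$$

And this in turn is satisfied if

$$\forall k \quad \sum_j \left\| A_j^{(k)} - a_j \right\| \leq \epsilon. \qquad (C3)$$

Here the operator sup norm is used. Proof is by the
Hölder inequality for the trace pairing of operators:

$$|\mathrm{Tr}(AB)| \leq \|A\| \cdot \|B\|_1.$$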
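
The chain of implications from the strongest to the weakest condition can be spelled out as follows. This is a sketch under the reading that (C1), (C2) and (C3) are each required for every position $k$, that $\|\cdot\|$ is the operator sup norm, that $\|\rho_i\|_1 = 1$ and that $|F_{ij}| \leq 1$.

```latex
\begin{align*}
\text{(C3)} \Rightarrow \text{(C2)}:&\quad
  \sum_j \left| \mathrm{Tr}\big(\rho_i (A_j^{(k)} - a_j)\big) \right|
  \le \sum_j \big\| A_j^{(k)} - a_j \big\| \, \|\rho_i\|_1
  = \sum_j \big\| A_j^{(k)} - a_j \big\| \le \epsilon, \\
\text{(C2)} \Rightarrow \text{(C1)}:&\quad
  \sum_{ij} p_i \left| \mathrm{Tr}(\rho_i A_j^{(k)}) - \mathrm{Tr}(\rho_i a_j) \right|
  \le \sum_i p_i \, \epsilon = \epsilon, \\
\text{(C1)} \Rightarrow \text{(C0)}:&\quad
  |F(A) - F(a)|
  \le \frac{1}{l} \sum_{k=1}^{l} \sum_{ij} |F_{ij}| \, p_i
      \left| \mathrm{Tr}(\rho_i A_j^{(k)}) - \mathrm{Tr}(\rho_i a_j) \right|
  \le \epsilon .
\end{align*}
```

The first line is the Hölder inequality applied to $A = A_j^{(k)} - a_j$ and $B = \rho_i$; the last line uses the rewriting of $F(A)$ as an average over positions.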

Now given any ensemble with average state $\rho$ and a
POVM $a = (a_j)_{j=1,\ldots,m}$ a canonical ensemble for $\rho$ can
be written down: the states

$$\hat\rho_j = \frac{1}{\mathrm{Tr}(\rho a_j)} \sqrt{\rho}\, a_j \sqrt{\rho}$$

with probabilities $\lambda_j = \mathrm{Tr}(\rho a_j)$.

Note that this ensemble has the property that its
"square root" (Holevo [5]) or "pretty good" (Hausladen,
Wootters [6]) measurement is exactly $a$:

$$a_j = \rho^{-1/2} (\lambda_j \hat\rho_j) \rho^{-1/2}.$$
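
A minimal numerical sketch of the canonical ensemble follows. To keep it free of external libraries it assumes a diagonal average state (so that $\sqrt{\rho}$ is trivial) and real symmetric POVM elements; these restrictions and all names are assumptions of the example, not of the construction. It checks that the ensemble averages to $\rho$ and that $\rho^{-1/2}(\lambda_j \hat\rho_j)\rho^{-1/2}$ gives back $a_j$.

```python
import math

# Hypothetical diagonal average state rho = diag(0.75, 0.25) and a two-outcome POVM.
rho_diag = [0.75, 0.25]
povm = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, -0.5], [-0.5, 0.5]]]

def canonical_ensemble(rho_diag, povm):
    """Return pairs (lambda_j, rho_hat_j), rho_hat_j = sqrt(rho) a_j sqrt(rho) / lambda_j."""
    root = [math.sqrt(x) for x in rho_diag]
    d = len(rho_diag)
    ensemble = []
    for a in povm:
        lam = sum(rho_diag[r] * a[r][r] for r in range(d))   # Tr(rho a_j)
        state = [[root[r] * a[r][c] * root[c] / lam for c in range(d)]
                 for r in range(d)]
        ensemble.append((lam, state))
    return ensemble

ensemble = canonical_ensemble(rho_diag, povm)

# Average state of the ensemble: should equal rho.
average = [[sum(lam * s[r][c] for lam, s in ensemble) for c in range(2)]
           for r in range(2)]

# Square root ("pretty good") measurement of the ensemble: should equal the a_j.
recovered = [[[lam * s[r][c] / (math.sqrt(rho_diag[r]) * math.sqrt(rho_diag[c]))
               for c in range(2)] for r in range(2)]
             for lam, s in ensemble]
```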



Theorem 2 With the above notation and $\epsilon > 0$, there
exists a POVM $A = (A_{j^l_\mu})_{\mu=1,\ldots,M}$ with

$$M \leq \exp\left( l \left( H(\rho) - \sum_j \lambda_j H(\hat\rho_j) \right) + C\sqrt{l} \right)$$

(where $C$ is a constant depending only on $\epsilon$, $d$ and $m$),
and such that

$$\forall k \quad \sum_j \left\| A_j^{(k)} - a_j \right\| \leq \epsilon.$$

The characteristic constant in the exponent,

$$I(\lambda; \hat\rho) = H(\rho) - \sum_j \lambda_j H(\hat\rho_j),$$

is called entropy defect of the ensemble (Lebedev and
Levitin [7]), or the quantum mutual information between
a sender producing letter $j$ with probability $\lambda_j$ and a
receiver getting the letter state $\hat\rho_j$ (see [8]). It is the
difference between the entropy $H(\rho)$ of the ensemble and
its conditional entropy $H(\hat\rho|\lambda) = \sum_j \lambda_j H(\hat\rho_j)$.

The theorem is in an asymptotic sense best possible:

Theorem 3 Let $0 < \epsilon < (\lambda_0/2)^2$, with $\lambda_0 = \min_j \lambda_j$.
Then for any POVM $A = (A_{j^l_\mu})_{\mu=1,\ldots,M}$ such that

$$\forall k \quad \sum_j \left\| A_j^{(k)} - a_j \right\| \leq \epsilon,$$

$$M \geq \exp\left( l \left( H(\rho) - \sum_j \lambda_j H(\hat\rho_j) - \epsilon \log d + \frac{2\epsilon}{\lambda_0} \log \frac{2\epsilon}{\lambda_0 d} \right) \right).$$

These theorems are proven in the following two
sections. They provide answers to the questions at the
beginning of this section. By demanding a bit more, namely
condition C3 instead of the weaker C0, we find the
optimal rate of compression for any POVM. This improves
the previous result (theorem 1) in all cases where the
$a_j$ are not all of rank 1. This optimal compression is
independent of fidelities, as well as independent of the
ensemble structure, except for the average state $\rho$.

These theorems also answer a question from [3],
whether the result of that paper still holds for fidelity
measures which cannot be reduced to the form described
in the introduction (i.e. an average over certain fixed
numbers, with probabilities $p_i \langle\psi_i|a_j|\psi_i\rangle$), e.g. ones which
depend in some nonlinear way on the POVM used.
Theorem 2 gives an affirmative answer for all fidelity measures
which depend continuously on the POVM (to be precise,
on its marginals: the definition of the fidelity on blocks
as average of the single block fidelities seems to remain
essential): an example for this will be discussed in
section VII B below.



IV. LOWER BOUND

The proof of theorem 3 rests on some standard facts
about von Neumann entropy:

Lemma 4 Let $\sigma_j$ be quantum states on $\mathcal{H}$, $\lambda_j$
probabilities, and $\sigma = \sum_j \lambda_j \sigma_j$. Then

$$H(\sigma) \leq H(\lambda) + \sum_j \lambda_j H(\sigma_j).$$

Proof. See [9]: this is just the monotonicity of the mutual
information (data processing inequality) under the
completely positive and trace preserving map $j \mapsto \sigma_j$ from
the commutative algebra generated by the $j$ as mutually
orthogonal idempotents to the algebra of linear operators
on $\mathcal{H}$. □

Lemma 5 Let $\sigma_1, \ldots, \sigma_r$ be states on
$\mathcal{H}_1 \otimes \mathcal{H}_2$, with probabilities $s_1, \ldots, s_r$,
such that $\sum_i s_i \sigma_i$ is a product state. Then

$$I(s; \sigma) \geq I(s; \mathrm{Tr}_2 \sigma) + I(s; \mathrm{Tr}_1 \sigma).$$

Proof. This is essentially the sub-additivity of entropy
(see [10], p. 23). □

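Lemma 6 Let $\sigma_1, \ldots, \sigma_r$ be states on $\mathcal{H}$, with
probabilities $s_1, \ldots, s_r$, and $(J_1, \ldots, J_t)$ a partition of
$\{1, \ldots, r\}$. Then, denoting

$$\bar{s}_j = \sum_{i \in J_j} s_i \quad \text{and} \quad \bar\sigma_j = \frac{1}{\bar{s}_j} \sum_{i \in J_j} s_i \sigma_i,$$

it follows that

$$I(s; \sigma) \geq I(\bar{s}; \bar\sigma).$$

Proof. See [9]: it is another special case of monotonicity,
known as coarse graining. For a direct proof observe that

$$\sum_i s_i \sigma_i = \sum_j \bar{s}_j \bar\sigma_j,$$

and by the concavity of von Neumann entropy

$$\sum_{i \in J_j} s_i H(\sigma_i) \leq \bar{s}_j H(\bar\sigma_j). \qquad □$$

Lemma 7 Let $\rho$, $\sigma$ be states on $\mathcal{H}$, $d = \dim \mathcal{H}$, and
$\|\rho - \sigma\|_1 \leq \alpha \leq 1/2$. Then

$$|H(\rho) - H(\sigma)| \leq -\alpha \log \frac{\alpha}{d}.$$

Proof. See [10], p. 22. □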
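
The continuity bound of Lemma 7 can be sanity-checked in the special case of commuting states, where the states reduce to probability vectors and the trace norm to the $\ell_1$ distance. The sketch below is such a check; the two distributions are hypothetical, and the commuting case is of course only an illustration, not a proof.

```python
import math

def entropy_bits(p):
    """Entropy in bits of a probability vector (a diagonal state)."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def continuity_bound(alpha, d):
    """Right-hand side -alpha*log2(alpha/d) of Lemma 7; needs 0 < alpha <= 1/2."""
    if not 0 < alpha <= 0.5:
        raise ValueError("alpha must lie in (0, 1/2]")
    return -alpha * math.log2(alpha / d)

# Hypothetical diagonal states in dimension d = 3.
p = [0.5, 0.3, 0.2]
q = [0.45, 0.3, 0.25]

alpha = sum(abs(x - y) for x, y in zip(p, q))   # trace norm distance of diagonal states
gap = abs(entropy_bits(p) - entropy_bits(q))
bound = continuity_bound(alpha, len(p))
```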

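Now we are ready for
Proof of theorem 3. On $\mathcal{H}^{\otimes l}$ consider any POVM
$A = (A_{j^l_\mu})_{\mu=1,\ldots,M}$ which satisfies the hypothesis of the
theorem. Then, denoting $\Lambda_\mu = \mathrm{Tr}(\rho^{\otimes l} A_{j^l_\mu})$ and

$$\hat\rho_\mu = \frac{1}{\Lambda_\mu} \sqrt{\rho}^{\otimes l} A_{j^l_\mu} \sqrt{\rho}^{\otimes l},$$

we find

$$\log M \geq H(\Lambda) \geq H(\rho^{\otimes l}) - \sum_\mu \Lambda_\mu H(\hat\rho_\mu) = I(\Lambda; \hat\rho^l) \geq \sum_{k=1}^{l} I(\lambda^{(k)}; \hat\rho^{(k)}),$$

using lemmas 4, 5 and 6, with the marginal distributions
given by

$$\lambda_j^{(k)} = \mathrm{Tr}(\rho A_j^{(k)})$$

and the marginal channel states

$$\hat\rho_j^{(k)} = \frac{1}{\lambda_j^{(k)}} \sqrt{\rho}\, A_j^{(k)} \sqrt{\rho}.$$

By the hypothesis we have

$$\|\lambda^{(k)} - \lambda\|_1 \leq \epsilon,$$

and consequently for every $k$ and $j$

$$\|\hat\rho_j^{(k)} - \hat\rho_j\|_1 \leq \frac{2\epsilon}{\lambda_0}.$$

Thus we can estimate for every $k$:

$$I(\lambda^{(k)}; \hat\rho^{(k)}) = H(\rho) - \sum_j \lambda_j^{(k)} H(\hat\rho_j^{(k)}) \geq H(\rho) - \sum_j \lambda_j H(\hat\rho_j) - \epsilon \log d + \frac{2\epsilon}{\lambda_0} \log \frac{2\epsilon}{\lambda_0 d},$$

where we have used lemma 7, and we are done. □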



V. THRIFTY MEASUREMENTS 

We will prove theorem 2 in several steps
(propositions 9, 10, 11, and 12 below). The strategy is as
follows: we construct a series of sub-POVMs (see footnote 2)
B, C, D, and E, each in turn satisfying the condition C3 (which
we demonstrate for didactical reasons even though this is
not necessary for the ultimate proof), and of increasing
regularity. The last step to construct
$A = (A_{j^l_\mu})_{\mu=1,\ldots,M}$ is a random selection argument
with a novel large deviation probability estimate.

2 A sub-POVM is a POVM except for the weaker condition
that the sum of its elements is only upper bounded by 1.

To do this we have first to review the concepts of
typical subspace and conditional typical subspace, in the
form of [11]:

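For a state $\rho$ fix eigenstates $e_1, \ldots, e_d$ and define for
$\delta > 0$ the typical projector as

$$\Pi^l_{\rho,\delta} = \sum_{(t_1, \ldots, t_l)\ \text{with}\ \left| \sum_{k=1}^{l} e_{t_k} - l\rho \right| \leq \delta \sqrt{l} \sqrt{\rho(1-\rho)}} e_{t_1} \otimes \cdots \otimes e_{t_l}.$$

For a collection of states $\hat\rho_j$, $j = 1, \ldots, m$, and $j^l \in [m]^l$
define the conditional typical projector as

$$\Pi^l_{\hat\rho,\delta}(j^l) = \bigotimes_j \Pi^{I_j}_{\hat\rho_j,\delta},$$

where $I_j = \{k : j_k = j\}$ and $\Pi^{I_j}_{\hat\rho_j,\delta}$ is meant to denote the
typical projector of the state $\hat\rho_j$ in the positions given by
$I_j$ in the tensor product of $l$ factors. From [11] we cite
the following properties of these projectors:

$$\mathrm{Tr}\, \Pi^l_{\rho,\delta} \leq \exp\left( l H(\rho) + K d \delta \sqrt{l} \right), \qquad (4)$$
$$\mathrm{Tr}\, \Pi^l_{\rho,\delta} \geq \left( 1 - \frac{d}{\delta^2} \right) \exp\left( l H(\rho) - K d \delta \sqrt{l} \right), \qquad (5)$$
$$\mathrm{Tr}\, \Pi^l_{\hat\rho,\delta}(j^l) \leq \exp\left( l H(\hat\rho|P_{j^l}) + K m d \delta \sqrt{l} \right), \qquad (6)$$
$$\mathrm{Tr}\, \Pi^l_{\hat\rho,\delta}(j^l) \geq \left( 1 - \frac{md}{\delta^2} \right) \exp\left( l H(\hat\rho|P_{j^l}) - K m d \delta \sqrt{l} \right), \qquad (7)$$

for an absolute constant $K > 0$, and the empirical
distribution $P_{j^l}$ of letters $j$ in the word $j^l$:

$$P_{j^l}(j) = \frac{\#\ \text{of occurrences of}\ j\ \text{in}\ j^l}{l}.$$

Also from [11]

$$\mathrm{Tr}(\rho^{\otimes l} \Pi^l_{\rho,\delta}) \geq 1 - \frac{d}{\delta^2}, \qquad (8)$$
$$\mathrm{Tr}(\hat\rho_{j^l} \Pi^l_{\hat\rho,\delta}(j^l)) \geq 1 - \frac{md}{\delta^2}. \qquad (9)$$

To end this review observe the following important
operator estimates:

$$\Pi^l_{\rho,\delta} \geq \Pi^{[l] \setminus k}_{\rho,\delta'} \otimes 1, \qquad (10)$$
$$\Pi^l_{\hat\rho,\delta}(j^l) \geq \Pi^{[l] \setminus k}_{\hat\rho,\delta'}(j^{[l] \setminus k}) \otimes 1, \qquad (11)$$

where $\delta' = \delta - 1/r \geq \delta/2$, if we assume $\delta \geq 2/r$, with $r$
denoting the minimal eigenvalue of $\rho$. Inequalities
(10) and (11) will be used in conjunction with the
following lemma: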
Lemma 8 Let $C$ be a positive operator on $\mathcal{H}_1 \otimes \mathcal{H}_2$, $\Pi$
a projector on $\mathcal{H}_1 \otimes \mathcal{H}_2$, and $\Pi_0$ a projector on $\mathcal{H}_2$ such
that $\Pi \geq 1 \otimes \Pi_0$. Then

$$\mathrm{Tr}_2(\Pi C \Pi) \geq \mathrm{Tr}_2((1 \otimes \Pi_0) C (1 \otimes \Pi_0)).$$

Proof. Because of $\Pi (1 \otimes \Pi_0) = 1 \otimes \Pi_0$ we may assume
that $C = \Pi C \Pi$. Thus we have to prove that

$$\mathrm{Tr}_2 C \geq \mathrm{Tr}_2((1 \otimes \Pi_0) C (1 \otimes \Pi_0)).$$

But this is equivalent to

$$\forall A \geq 0 \quad \mathrm{Tr}((A \otimes 1) C) \geq \mathrm{Tr}((A \otimes \Pi_0) C),$$

which in turn is equivalent to $A \otimes 1 \geq A \otimes \Pi_0$, and this
is obvious. □
Define the following operators (with $\rho$ and $\hat\rho_j$ as in
section III):



Intuitively this means to confine the $a_{j^l}$ to the range of
the conditional typical projector $\Pi^l_{\hat\rho,\delta}(j^l)$.

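Proposition 9
1. $0 \leq B_{j^l} \leq a_{j^l}$.
2. $\mathrm{Tr}(\rho^{\otimes l} B_{j^l}) \geq \left( 1 - \frac{md}{\delta^2} \right) \mathrm{Tr}(\rho^{\otimes l} a_{j^l})$.
3. $\sqrt{\rho}\, a_j \sqrt{\rho} - \Delta^1_j \leq \sqrt{\rho}\, B_j^{(k)} \sqrt{\rho} \leq \sqrt{\rho}\, a_j \sqrt{\rho}$, with $\Delta^1_j \geq 0$
and $\mathrm{Tr}\, \Delta^1_j \leq \lambda_j \frac{md}{\delta^2}$.
4. $\forall k\, \forall j \quad \|\sqrt{\rho}\, B_j^{(k)} \sqrt{\rho} - \sqrt{\rho}\, a_j \sqrt{\rho}\|_1 \leq \lambda_j \frac{md}{\delta^2}$.
5. $\forall k \quad \sum_j \|B_j^{(k)} - a_j\| \leq \frac{md}{r \delta^2}$.

Proof. 1. is equivalent to
$\sqrt{\rho}^{\otimes l} B_{j^l} \sqrt{\rho}^{\otimes l} \leq \sqrt{\rho}^{\otimes l} a_{j^l} \sqrt{\rho}^{\otimes l}$,
which is immediate from the definition.

2. is essentially equation (9).

3. follows from 1. and 2.

Finally, 4. and 5. are easy consequences of 3. □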
Defining the operators 



i.e. restricting the $B_{j^l}$ to the range of the typical
projector $\Pi^l_{\rho,\delta}$, we find

Proposition 10 1. Tr (p® l Cji ) < Tr {p® l Bji). 

2. ^paj^p-A? < JpCfjp < ^pa^+A 2 , with 
A 2 > 0, A 2 = J2j A|, and Tr A 2 < X 3 ^^. 

3- VfcE, llVpCf Vp- yfpBf < 



VfcEjII^-Sf^ll < 



(fe)l 



rS 2 ' 



Proof. 1. follows from Tl l p jp® l Il l p . s < p m , and the defi- 
nition. 

To prove 2., we first do the lower bound (the other follows 
from this straightforwardly): 



jk=j 

= ^Jk J E V^V^'K 



^a,Vp Tr (p^V^)* 
yfpaji/p- Al 



where the inequality is with A ? -( > 0, TrA ? i < 



(by proposition and 2.), and by lemma|. Hence the 
subsequent equalities are valid with 



„ . , rnd , . 
Tr A < A,--j5-, with A 



and 



^ = A + Vpa,Vp(l-Tr (/ y 



E a, 



By equation I we conclude Tr A 2 < \j md £ id . 
Finally, 3. and 4. are easy consequences of 2. □ 
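Now with the probabilities $\lambda_j = \mathrm{Tr}(\rho a_j)$ define the set
of typical sequences

$$T^l_\delta = \left\{ j^l : \forall j \quad \left| N(j|j^l) - l\lambda_j \right| \leq \delta \sqrt{l} \sqrt{\lambda_j (1 - \lambda_j)} \right\},$$

where $N(j|j^l)$ is the number of occurrences of $j$ in $j^l$.

The next simplification is to use only operators of our
sub-POVM C with typical $j^l$: define the sub-POVM D
to consist of the $C_{j^l}$ for $j^l \in T^l_\delta$, i.e.
$D = (C_{j^l})_{j^l \in T^l_\delta}$.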
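
Membership in the set of typical sequences is easy to test in code. The sketch below assumes the reading of the definition used here, namely that every letter count $N(j|j^l)$ must lie within $\delta \sqrt{l} \sqrt{\lambda_j(1-\lambda_j)}$ of $l\lambda_j$; the function name and the example words are hypothetical.

```python
import math
from collections import Counter

def is_typical(word, lam, delta):
    """True if every letter count is within delta*sqrt(l*lam_j*(1-lam_j)) of l*lam_j."""
    l = len(word)
    counts = Counter(word)
    return all(
        abs(counts.get(j, 0) - l * lam_j)
        <= delta * math.sqrt(l * lam_j * (1 - lam_j))
        for j, lam_j in enumerate(lam)
    )

# Hypothetical outcome probabilities and two words of length l = 16.
lam = [0.5, 0.25, 0.25]
balanced = [0] * 8 + [1] * 4 + [2] * 4   # letter counts match l*lam exactly
lopsided = [0] * 16                      # only letter 0 occurs

balanced_ok = is_typical(balanced, lam, 1.0)
lopsided_ok = is_typical(lopsided, lam, 1.0)
```

Restricting a sub-POVM to typical words discards only sequences whose total probability is small, by Chebyshev's inequality.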

Proposition 11 1. X l (T s l ) =: S > 1 - J|. 
2. TrQP'Dj,) > (l- f^) Tr^a.-O. 



5. JpDfjp = JpC) K >Jp - A), with A] > 0, 
EiTrA? < m/S 2 . 

5. Vfc £. ||i?f - Cf 5 1| 

Proof. 1. follows from Chebyshev's inequality (com- 
pare 0). 

2. is seen as follows: with the eigenstates e t of p define 



Pj = E (Pj) = E et ^ 



G 



with the conditional expectation E. Then it is obvious 
that 

Tr(p jl n l p<s )=Tr(p jl U l pi5 ). 
From the definitions it can be directly verified that, with 

p = f Efc Pj k i 

n p,5 > II- r g^, 



hence by |ll[], lemma V.9 



Tr (pfU 1 ^) > 1 



m d 
r 2 8 2 ' 



and with proposition 2. the claim follows. 
For 3. observe 

VpCfYp~VpDfYp= E Tr^^C,,^, 

and denoting the r.h.s by A?, the claim follows from 1., 
observing that 

Tr ^/p® 1 C ji ^/p^ 1 < Xji, 



by propositions y. 1 . and [10[ 1 . 

Again, 4. and 5. are easy consequences. 



□ 

We shall use the probability distribution AonT/, with 

1 



A 



Observe that 

J 1 ^ 



with 



Introducing 



TrA 4 < (m + l)(d + l) 
<S 2 



1 - J exp (-lH(p) - KdSVl 



(so that 11^ s p® l U l p s > aH l p s ), we can construct the sub- 
space spanned by the eigenvectors of M corresponding to 
eigenvalues at least ca. With its projection II we have 
ITwII > call. This implies 

Tr ( W (n{, i4 - n)) < c, 

hence because of Tru>H l s > 1 — c 

Tru;II>l-2c. (12) 
Now, define the sub-POVM E by 



E, 



Proposition 12 For f e T s l : 

1. Tr (p® l Eji) < Tr (p® l Djt). 



2- vfcE, llV^f Vp- VP D i k) VPh ^ 2 



11 V r J 



3. VfcEj 114 ; - -Drill <2mc/r. 

Proof. 1. is obvious, and 3. follows from 2. 
To prove 2., first calculate 

V^f VP = Tr #fe I £ VP 



<8>J 



with 



= T 1Vfc J] iy^^n 

= : AjTr^fcilwfcjTI, 



Observe that by equation [l^ 

Tr LUkj II > 1 



2c 



But because of 

n l Pt eu k iU l PtS - uuuu = (n l Pt8 - n)w w n{, ia 
+ n^(n^-n), 

we get 

||n^^n^ - UwmUIU < ||(n^ - n)o; W n^||i 

+ ||nw fcl (n| )ia -n)|| 1 
<2Tr (^(n^-n)) 

< 2c/ Xj, 

thus we conclude 

VP E j k) \fp = Tr^II^A^-n^ + A| 

where || A| ||i < 2c. □ 
The proof of the theorem will now be completed by a
random selection of a sufficient number of elements from
E. We invoke a result from [12]:

Lemma 13 Let $X_1, \ldots, X_M$ be i.i.d. random variables
with values in the algebra $\mathcal{L}(\mathcal{K})$ of linear operators on $\mathcal{K}$,
which are bounded between 0 and 1. Assume that the
average $\mathbb{E} X_\mu = \sigma \geq s 1$. Then for every $\eta > 0$

$$\Pr\left\{ \frac{1}{M} \sum_{\mu=1}^{M} X_\mu \not\leq (1 + \eta)\sigma \right\} \leq \dim\mathcal{K}\, \exp\left( -M \frac{\eta^2 s}{2 \ln 2} \right).$$
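
To get a feeling for how many random selections such a bound calls for, the sketch below evaluates a bound of the form $\dim\mathcal{K}\, \exp(-M \eta^2 s / (2 \ln 2))$ and solves for the smallest sufficient $M$. The form of the bound is the reading of Lemma 13 assumed here, with exp taken to base 2 as elsewhere in the paper, so that it equals $\dim\mathcal{K}\, e^{-M\eta^2 s/2}$. The function names and the numbers are hypothetical.

```python
import math

def selection_failure_bound(dim, M, eta, s):
    """dim * 2**(-M*eta**2*s / (2 ln 2)), which equals dim * e**(-M*eta**2*s/2)."""
    return dim * 2.0 ** (-M * eta ** 2 * s / (2 * math.log(2)))

def samples_needed(dim, eta, s, target):
    """Smallest integer M for which the bound is at most `target`."""
    return math.ceil(2 * math.log(dim / target) / (eta ** 2 * s))

# Hypothetical parameters.
dim, eta, s, target = 4, 0.1, 0.5, 0.01
M = samples_needed(dim, eta, s, target)
bound_at_M = selection_failure_bound(dim, M, eta, s)
bound_below = selection_failure_bound(dim, M - 1, eta, s)
```

The required number of samples grows only logarithmically in the dimension but inversely with the lower bound $s$ on the average, which is why the proof first restricts to a subspace on which that lower bound is not too small.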
With this we can now finish
Proof of theorem 2. Starting from the POVM $a^{\otimes l}$
construct the sub-POVMs B, C, D, and E, as above.

Define i.i.d. random variables $J_1, \ldots, J_M$ with values
in $T^l_\delta$ such that

$$\Pr\{J_\mu = j^l\} = \Lambda_{j^l}, \qquad \mu = 1, \ldots, M.$$
These define operator valued random variables 

X p = -^^ l E.,^\ p = l,...,M. 
Observe that for all p 

ex p = n^n < oj < n l PtS p 9l n l PiS < P ®\ 

and that X p > with TrX^ < 1. Furthermore, since 

we have X p < (311 with 

(3 = exp (-lH(p\X) + KmdSVtj ■ 
Most importantly we find 

EX p = UujU > can. 
Apply lemma O to the variables /3~~ 1 X p to find 



< Trnexp -M 



rj 2 ca 
2/3 In 2 



(13) 



\ik] 

Define Y p — Tr^A^ if J p k = j and otherwise. 
Observe that 

so by propositions |.2., 0.2., 02., and [l|.2. 

r.-n _ _„ ra m 2 d + 4md md 

\\mrM-y/pa i Vp\\i<2c+g2+ — j 2 — + ^r= :d - 

Thus, by the operator Chebyshev inequality E3] 



Pr 



^Y^-M^paj^p 



> M~c + Sy^Vl > < 



IS 2 ' 
(14) 



In case that the sum of the right hand sides of the 
probability estimates from equations ([l3]) and (|lj) (j = 
1, . . . , to, k = 1, . . . , 1) is less than 1 — which can be 
forced by choosing 



Hi — ; , 21n2(l-loga)/3 
<5 > V2md and M > ^ ?—Lt- 



rfc 



(15) 



- there are actual values J\ = j[ , . . . , Jm — Jm sucn 
that 

lE^"^^^ < H + V)K,SP & K,S, (16) 



and 



37 H ^m^v^'^v^'J-VwjVp 



< c- 



VaT 

(17) 



In this case we may form the following sub-POVM: 
1 



A 



J " (1 + rj)M 

1 S 



—7®l I S ®l 01 . 

P I —VP % VP VP 



(1 + r?)M Xji Ej '» 



First observe that this is indeed a sub-POVM, as by 
equation 



P ,s- 



We claim that it satisfies condition C3, more precisely, 
by equation (117]) we find 



VfcVj 



J2 T>/fc(vp®'%\/p j-\/p a jVp 



< V + 5 



Distributing the remainder i? = 1 — Aji equally 
over the operators will give us our desired POVM A: 



A.-i = Aai H i?. 



^ Namely, it is immediate that now (inserting equation |jj|) 



Vp4 fc Vp~ y/pajy/P <(m+l)[r, + c + 



and we are done, by choosing r\ = 5 , with 5 suitably 
large, and M according to equation (|lq). □ 
VI. EXTENSIONS

Not to encumber the proofs with too many estimates
which actually would not contribute to the understanding
of the results, we chose to present theorems 2 and 3 in
their above form.

However, let us see here how far we can actually get
with our theorem 2: we might for example go beyond C3
by requiring

$$\sum_{k=1}^{l} \sum_j \left\| A_j^{(k)} - a_j \right\| \leq \epsilon. \qquad (C4)$$

We might go even further, and demand that $A$
approximates $a^{\otimes l}$ not only on single factors but also on all
subsets of factors, $K \subset \{1, \ldots, l\}$, of moderately growing size,
say $|K| \leq \nu_l = o(l)$:

$$\sum_{K \subset [l],\, |K| \leq \nu_l} \sum_{j^K} \left\| A^{(K)}_{j^K} - a_{j^K} \right\| \leq \epsilon. \qquad (C5)$$

Here

$$A^{(K)}_{j^K} = \sum_{\mu:\, \forall k \in K\ j_{\mu k} = j_k} \mathrm{Tr}_{[l] \setminus K} \left( A_{j^l_\mu} (\rho^{\otimes [l] \setminus K} \otimes 1) \right)$$

is the restriction of $A$ to the tensor factors $K$ and

$$a_{j^K} = \bigotimes_{k \in K} a_{j_k}$$

is an element of the $K$-factor POVM $a^{\otimes K}$.

It turns out that, using slightly stronger estimates for
the typical subspaces and typical sequences than those
used in section V, one can prove

Theorem 14 With the above notation, there exists a
POVM $A = (A_{j^l_\mu})_{\mu=1,\ldots,M}$ with

$$M \leq \exp\left( l \left( H(\rho) - \sum_j \lambda_j H(\hat\rho_j) \right) + o(l) \right)$$

and satisfying condition C5.



Proof. From [13], lemma I.9, we use the following
estimates: for fixed $\hat\rho_j$ and $\rho$ there exists a constant $\gamma > 0$
such that

$$\mathrm{Tr}(\rho^{\otimes l} \Pi^l_{\rho,\delta}) \geq 1 - d \cdot e^{-\gamma \delta^2}, \qquad (18)$$
$$\mathrm{Tr}(\hat\rho_{j^l} \Pi^l_{\hat\rho,\delta}(j^l)) \geq 1 - md \cdot e^{-\gamma \delta^2}. \qquad (19)$$

Instead of equations (8) and (9), use equations (18) and
(19) in all steps of the proof in section V. Then, choosing
$\delta = \delta_l$ such that

$$\qquad = o(\delta_l^2), \quad \delta_l = o(\sqrt{l}),$$

and with $\eta = \exp(-\delta^2)$, the theorem follows. □

As an immediate corollary we get an improvement of
theorem 1:

Theorem 15 For $\epsilon > 0$ and $l$ large enough there exists
a POVM $A$ satisfying

$$l \cdot F(A) \geq l \cdot F_{opt} - \epsilon$$

and with

$$M \leq \exp(l(H(\rho) + \epsilon))$$

many outcomes. □

On the other hand, inspection of the proof of theorem 3
shows that it remains valid (up to another $O(\epsilon l)$ in the
exponent) under the slightly weaker condition

$$\frac{1}{l} \sum_{k=1}^{l} \sum_j \left\| A_j^{(k)} - a_j \right\| \leq \epsilon. \qquad (C2½)$$



VII. DISCUSSION 

In this article we have shown how to compress
quantum measurements. More precisely we have shown how
to devise a measurement $A$ that is close to a certain
given POVM $a$ but that produces a minimum amount
of data. This minimum amount of data is equal to
$I(\lambda; \hat\rho) = H(\rho) - \sum_j \lambda_j H(\hat\rho_j)$, where

$$\hat\rho_j = \frac{1}{\lambda_j} \sqrt{\rho}\, a_j \sqrt{\rho}, \qquad \lambda_j = \mathrm{Tr}\, \rho a_j.$$

This result provides a precise measure of how much
knowledge about the unknown states is provided by
the measurement $a$. Namely the amount of knowledge
provided by the measurement is equal to the minimal
amount of classical data produced by measurements $A$ that
are close to $a$. This is because the measurement $A$
resembles the measurement $a$, hence provides as much
knowledge about the states as $a$. But the spurious randomness
in data produced by the measurement $a$ has been
removed. Thus we deduce that the amount of meaningful
data produced by the measurement $a$ is $I(\lambda; \hat\rho)$.
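
The entropy defect can be evaluated directly for small examples. The sketch below does so for a qubit, assuming a diagonal average state and real symmetric POVM elements so that only closed-form $2 \times 2$ eigenvalues are needed; these restrictions, the helper names and the three example measurements are assumptions of the illustration.

```python
import math

def h2(eigs):
    """Entropy in bits of a list of eigenvalues (tiny or zero values are skipped)."""
    return -sum(x * math.log2(x) for x in eigs if x > 1e-15)

def eig_sym2(m):
    """Eigenvalues of a real symmetric 2x2 matrix."""
    mean = (m[0][0] + m[1][1]) / 2
    radius = math.hypot((m[0][0] - m[1][1]) / 2, m[0][1])
    return [mean + radius, mean - radius]

def entropy_defect(rho_diag, povm):
    """H(rho) - sum_j lambda_j H(rho_hat_j) for rho = diag(rho_diag)."""
    root = [math.sqrt(x) for x in rho_diag]
    total = h2(rho_diag)
    for a in povm:
        lam = rho_diag[0] * a[0][0] + rho_diag[1] * a[1][1]   # Tr(rho a_j)
        if lam <= 0:
            continue
        rho_hat = [[root[r] * a[r][c] * root[c] / lam for c in range(2)]
                   for r in range(2)]
        total -= lam * h2(eig_sym2(rho_hat))
    return total

rho_diag = [0.5, 0.5]
projective = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]   # rank-one elements
noisy = [[[0.8, 0.0], [0.0, 0.2]], [[0.2, 0.0], [0.0, 0.8]]]        # not maximally refined
trivial = [[[1.0, 0.0], [0.0, 1.0]]]                                # identity only

defect_projective = entropy_defect(rho_diag, projective)
defect_noisy = entropy_defect(rho_diag, noisy)
defect_trivial = entropy_defect(rho_diag, trivial)
```

In this example the maximally refined measurement yields the full $H(\rho) = 1$ bit, the noisy one strictly less, and the trivial measurement none.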

We now consider several questions and lines of inquiry 
which are suggested by the present results. 



A. Information missed by incomplete measurements 

Consider a POVM that is not maximally refined. By
this we mean that the POVM elements $a_j$ are not all
proportional to one dimensional projectors. Such a POVM
does not provide maximum knowledge about a quantum
state. However at a later stage one can refine the POVM
so as to obtain additional knowledge about the state.
We would like to know whether carrying out such a
sequence of measurements provides the maximum
knowledge about the state, or whether there is an irreversible
loss of knowledge in such a two step measurement. We
shall argue, using the results presented in this paper, that
if the first measurement is carried out in such a way as to
minimize disturbance to the state, then no knowledge is
lost by such a two step procedure.

When the POVM is not maximally refined the amount 
of meaningful data produced by the measurement is less 
than if the measurement is maximally refined since the 
term which one subtracts . XjH(pj) in J(A; p) vanishes 
in one case and not in the other. Suppose that after the 
first measurement one carries out a second measurement 
b which is maximally refined. Let us note that after the 
first measurement, if the state was pt and the outcome 
was j, one obtains the state o^i given by the completely 
positive map 



Tr pi a j 



(20) 



where J2» V jv V jv 
outcome was j is 



And the average state if the 



Tr pa,j 



(21) 



Since the second measurement is maximally refined, the 
amount of meaningful data it produces is equal to H{aj). 

We first consider the case when the completely positive
map has only one term in its Kraus representation. In
this case

$$ \sigma_j = \frac{U_j \sqrt{a_j}\,\rho\,\sqrt{a_j}\,U_j^\dagger}{\operatorname{Tr}\rho a_j} $$

(where $U_j$ is a unitary matrix). We now show that the
amount of meaningful data produced by the second measurement,
$H(\sigma_j)$, is equal to the amount of data missing from
the first measurement, $H(\rho_j)$. This follows from the
fact that $\sigma_j$ and $\rho_j$ have the same spectrum, as
they are conjugates. To show this it is sufficient to show
that $\sqrt{\rho}\,a_j\sqrt{\rho}$ and $\sqrt{a_j}\,\rho\,\sqrt{a_j}$
are conjugates. Using the notation $B = \sqrt{a_j}\sqrt{\rho}$,
we have

$$ B^\dagger B = \sqrt{\rho}\,a_j\sqrt{\rho}, \qquad B B^\dagger = \sqrt{a_j}\,\rho\,\sqrt{a_j}. $$

Introducing the polar decomposition $B = U|B|$ (where
U is unitary and $|B| = \sqrt{B^\dagger B}$), we find

$$ B B^\dagger = U |B|^2 U^\dagger = U (B^\dagger B) U^\dagger, $$

which is what we needed to show.
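
As a numerical sanity check of this conjugacy argument (an illustration added here, not part of the derivation), the following minimal sketch compares the spectra of $\sqrt{\rho}\,a\sqrt{\rho}$ and $\sqrt{a}\,\rho\,\sqrt{a}$ for one qubit. The matrices `rho` and `a` and all helper names are assumptions chosen for the example; the square root uses a closed form valid only for 2x2 positive semidefinite matrices.

```python
import math

def matmul(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(m):
    return m[0][0] + m[1][1]

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def psd_sqrt(m):
    """Square root of a nonzero 2x2 positive semidefinite matrix.

    Closed form sqrt(M) = (M + s*I) / t with s = sqrt(det M) and
    t = sqrt(tr M + 2*s), a consequence of Cayley-Hamilton.
    """
    s = math.sqrt(max(det(m), 0.0))
    t = math.sqrt(trace(m) + 2.0 * s)
    return [[(m[i][j] + (s if i == j else 0.0)) / t for j in range(2)]
            for i in range(2)]

def spectrum(m):
    """Eigenvalues (ascending) of a 2x2 symmetric matrix."""
    half = trace(m) / 2.0
    gap = math.sqrt(max(half * half - det(m), 0.0))
    return (half - gap, half + gap)

def entropy_bits(eigs):
    """Entropy in bits of a normalized spectrum."""
    return -sum(p * math.log2(p) for p in eigs if p > 0.0)

# Assumed toy data: average state rho and one POVM element a (0 <= a <= 1).
rho = [[0.7, 0.2], [0.2, 0.3]]
a = [[0.5, 0.25], [0.25, 0.5]]

sqrt_rho, sqrt_a = psd_sqrt(rho), psd_sqrt(a)
b_dag_b = matmul(matmul(sqrt_rho, a), sqrt_rho)   # sqrt(rho) a sqrt(rho)
b_b_dag = matmul(matmul(sqrt_a, rho), sqrt_a)     # sqrt(a) rho sqrt(a)

lam = trace(b_dag_b)                               # lambda_j = Tr(rho a)
spec_rho_j = tuple(e / lam for e in spectrum(b_dag_b))
spec_sigma_j = tuple(e / trace(b_b_dag) for e in spectrum(b_b_dag))

print("spectrum of rho_j:  ", spec_rho_j)
print("spectrum of sigma_j:", spec_sigma_j)
print("entropies (bits):   ", entropy_bits(spec_rho_j), entropy_bits(spec_sigma_j))
```

In dimension two the agreement already follows from the equality of trace and determinant, so this is only a consistency check of the formulas and not an independent proof.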

Thus in the case where the Kraus representation of the
completely positive map (20) contains only one term, the
deficit in the amount of meaningful data produced by
the first measurement is exactly equal to the amount of
meaningful data obtained by the second (maximally refined)
measurement. It thus appears that making an incomplete
measurement for which the Kraus representation of the
measurement operation contains only one term for each POVM
element does not give rise to an irreversible loss of
knowledge. Rather the knowledge is still present and can be
accessed by a second more refined measurement.



The case when the Kraus representation of the completely
positive map (20) contains only one term corresponds to the
situation in which one disturbs the quantum state as little
as possible. On the other hand, when the Kraus representation
contains more than one term, one easily checks on examples
that the amount of information obtained by the second
measurement bears no relation to the amount of information
obtained by the first measurement. This is because the map
can either add noise to the state or take away information.
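
One such example, given here as an illustration and not taken from the original text, is a measurement that discards the system after each outcome. Assume a fixed pure state $|\varphi\rangle$ and an orthonormal basis $\{|\nu\rangle\}$ (both are choices made for this example only), and take the Kraus operators below.

```latex
\begin{aligned}
V_{j\nu} &= |\varphi\rangle\langle\nu|\sqrt{a_j}, \\
\sum_\nu V_{j\nu}^\dagger V_{j\nu}
  &= \sqrt{a_j}\Big(\sum_\nu |\nu\rangle\langle\nu|\Big)\sqrt{a_j} = a_j, \\
\sigma_j
  &= \frac{\sum_\nu V_{j\nu}\,\rho\,V_{j\nu}^\dagger}{\operatorname{Tr}\rho a_j}
   = \frac{\operatorname{Tr}(\sqrt{a_j}\,\rho\,\sqrt{a_j})}{\operatorname{Tr}\rho a_j}\,
     |\varphi\rangle\langle\varphi|
   = |\varphi\rangle\langle\varphi| .
\end{aligned}
```

Here $H(\sigma_j) = 0$ whatever the value of $H(\rho_j)$, and every input state is mapped to the same output, so a second measurement can learn nothing further: the map has taken the remaining information away.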

The above discussion raises an interesting question 
concerning the amount of information transferred to a 
state or taken away from the state by a completely pos- 
itive map. The approach developed in this paper may 
illuminate this question and we hope to report on this in 
a future paper. 



B. Relation to Holevo's bound 

Consider an ensemble of states $\{\sigma_i, \mu_i\}$ whose
average is $\sum_i \mu_i \sigma_i = \rho$, and consider a
POVM with elements $a_j$. We define random variables X, Y
with joint distribution

$$ \Pr\{X = i, Y = j\} = \mu_i \operatorname{Tr}(\sigma_i a_j). \tag{22} $$

They describe the joint probability that state $\sigma_i$
occurred in the ensemble and measurement outcome j occurred.

Holevo's bound [?] states that the mutual information
between a source with ensemble $\{\sigma_i, \mu_i\}$ and a
measurement $a_j$ is bounded by the entropy defect of the
ensemble:

$$ I(X \wedge Y) \le I(\mu;\sigma) = H(\rho) - \sum_i \mu_i H(\sigma_i). \tag{23} $$



Note that Holevo's bound is a function only of the
ensemble $\{\sigma_i, \mu_i\}$, and the measurement plays no
role in the bound. On the other hand, in the present paper
the ensemble plays a secondary role, and we have considered
how the joint distribution of X, Y changes when one
changes the measurement. In order to make the connection
with Holevo's bound we shall use a trick that allows us
to switch the roles of ensemble and measurement.

Let us denote the triple consisting of the states $\sigma_i$,
the probabilities $\mu_i$ and the POVM elements $a_j$ by

$$ \mathcal{M}_{\sigma,\mu,a} = \{\sigma_i, \mu_i, a_j\}. $$

We now construct a second triple

$$ \mathcal{M}_{\rho,\lambda,S} = \{\rho_j, \lambda_j, S_i\} $$

canonically associated with the first. In this second triple
the states are

$$ \rho_j = \frac{1}{\lambda_j}\sqrt{\rho}\,a_j\sqrt{\rho} $$

and their probabilities are $\lambda_j = \operatorname{Tr}\rho a_j$.
The POVM elements $S_i$ of the second triple are the "pretty
good" measurement of the ensemble $\{\sigma_i, \mu_i\}$:

$$ S_i = \mu_i\,\rho^{-1/2}\,\sigma_i\,\rho^{-1/2}. $$

We call these two triples canonically associated for
two reasons. First, the average of the states is the same:
$\sum_i \mu_i \sigma_i = \sum_j \lambda_j \rho_j = \rho$.
Second, the probability that state $\sigma_i$ occurred and
measurement outcome j occurred in the first triple is equal
to the probability that state $\rho_j$ occurred and
measurement outcome i occurred in the second triple:

$$ \Pr\{X = i, Y = j\} = \mu_i \operatorname{Tr}(\sigma_i a_j) = \lambda_j \operatorname{Tr}(S_i \rho_j). \tag{24} $$
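
The canonical association can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it takes a toy qubit ensemble (`sigma`, `mu`) and a two-outcome POVM `a`, builds the second triple with states $\sqrt{\rho}\,a_j\sqrt{\rho}/\lambda_j$, probabilities $\lambda_j = \operatorname{Tr}\rho a_j$ and pretty good measurement $\mu_i\,\rho^{-1/2}\sigma_i\,\rho^{-1/2}$, and compares the two joint distributions. All numerical values and helper names are hypothetical.

```python
import math

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(x, y):
    return [[x[i][j] + y[i][j] for j in range(2)] for i in range(2)]

def scale(c, m):
    return [[c * m[i][j] for j in range(2)] for i in range(2)]

def trace(m):
    return m[0][0] + m[1][1]

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inverse(m):
    d = det(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def psd_sqrt(m):
    """Closed-form square root of a nonzero 2x2 positive semidefinite matrix."""
    s = math.sqrt(max(det(m), 0.0))
    t = math.sqrt(trace(m) + 2.0 * s)
    return [[(m[i][j] + (s if i == j else 0.0)) / t for j in range(2)]
            for i in range(2)]

# First triple (assumed toy data): states sigma_i, probabilities mu_i, POVM a_j.
mu = [0.6, 0.4]
sigma = [[[1.0, 0.0], [0.0, 0.0]],      # |0><0|
         [[0.5, 0.5], [0.5, 0.5]]]      # |+><+|
a = [[[0.5, 0.25], [0.25, 0.5]],
     [[0.5, -0.25], [-0.25, 0.5]]]      # a[0] + a[1] is the identity

rho = add(scale(mu[0], sigma[0]), scale(mu[1], sigma[1]))
sqrt_rho = psd_sqrt(rho)
inv_sqrt_rho = inverse(sqrt_rho)        # rho must be invertible here

# Second triple: probabilities lambda_j, states rho_j, POVM elements S_i.
lam = [trace(matmul(rho, aj)) for aj in a]
rho_j = [scale(1.0 / w, matmul(matmul(sqrt_rho, aj), sqrt_rho))
         for w, aj in zip(lam, a)]
S = [scale(m, matmul(matmul(inv_sqrt_rho, s), inv_sqrt_rho))
     for m, s in zip(mu, sigma)]

# Joint distributions computed from the first and from the second triple.
first = [[mu[i] * trace(matmul(sigma[i], a[j])) for j in range(2)]
         for i in range(2)]
second = [[lam[j] * trace(matmul(S[i], rho_j[j])) for j in range(2)]
          for i in range(2)]
povm_sum = add(S[0], S[1])                                   # should be the identity
avg_second = add(scale(lam[0], rho_j[0]), scale(lam[1], rho_j[1]))  # should be rho

print(first)
print(second)
```

The sketch assumes an invertible $\rho$; for a singular average state the inverse square root would have to be restricted to the support of $\rho$.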

Using the relation between these two triples, and in
particular eq. (24), we can write two forms of Holevo's
bound. The first is equation (23), the second is

$$ I(X \wedge Y) \le I(\lambda;\rho) = H(\rho) - \sum_j \lambda_j H(\rho_j). \tag{25} $$

Thus for a given triple, say $\mathcal{M}_{\sigma,\mu,a}$, we
can derive two bounds on the mutual information: the first,
(23), depends only on the ensemble $\{\sigma_i, \mu_i\}$; the
second, (25), depends only on the average state $\rho$ and
the POVM a.
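
To make the two bounds concrete, the following sketch evaluates the classical mutual information and both entropy defects for a toy qubit triple. It is an illustration only: the ensemble, the POVM and every helper name are assumptions, and the entropy defects are computed as $H(\rho) - \sum_i \mu_i H(\sigma_i)$ and $H(\rho) - \sum_j \lambda_j H(\rho_j)$ with $\rho_j = \sqrt{\rho}\,a_j\sqrt{\rho}/\lambda_j$.

```python
import math

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(m):
    return m[0][0] + m[1][1]

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def psd_sqrt(m):
    """Closed-form square root of a nonzero 2x2 positive semidefinite matrix."""
    s = math.sqrt(max(det(m), 0.0))
    t = math.sqrt(trace(m) + 2.0 * s)
    return [[(m[i][j] + (s if i == j else 0.0)) / t for j in range(2)]
            for i in range(2)]

def entropy_bits(m):
    """Von Neumann entropy (bits) of a 2x2 positive matrix, after normalizing."""
    half = trace(m) / 2.0
    gap = math.sqrt(max(half * half - det(m), 0.0))
    eigs = [(half - gap) / trace(m), (half + gap) / trace(m)]
    return -sum(p * math.log2(p) for p in eigs if p > 0.0)

# Assumed toy triple: pure states |0>, |+> and a two-outcome POVM.
mu = [0.6, 0.4]
sigma = [[[1.0, 0.0], [0.0, 0.0]], [[0.5, 0.5], [0.5, 0.5]]]
a = [[[0.5, 0.25], [0.25, 0.5]], [[0.5, -0.25], [-0.25, 0.5]]]

rho = [[sum(mu[i] * sigma[i][r][c] for i in range(2)) for c in range(2)]
       for r in range(2)]
sqrt_rho = psd_sqrt(rho)

# Classical mutual information of Pr{X=i, Y=j} = mu_i Tr(sigma_i a_j).
joint = [[mu[i] * trace(matmul(sigma[i], a[j])) for j in range(2)]
         for i in range(2)]
lam = [joint[0][j] + joint[1][j] for j in range(2)]      # lambda_j = Tr(rho a_j)
i_xy = sum(joint[i][j] * math.log2(joint[i][j] / (mu[i] * lam[j]))
           for i in range(2) for j in range(2) if joint[i][j] > 0.0)

# First form: entropy defect of the ensemble {sigma_i, mu_i}.
bound_ensemble = entropy_bits(rho) - sum(
    mu[i] * entropy_bits(sigma[i]) for i in range(2))

# Second form: entropy defect of the canonically associated ensemble {rho_j, lambda_j}.
unnormalized_rho_j = [matmul(matmul(sqrt_rho, aj), sqrt_rho) for aj in a]
bound_measurement = entropy_bits(rho) - sum(
    lam[j] * entropy_bits(unnormalized_rho_j[j]) for j in range(2))

print("I(X ^ Y)          =", i_xy)
print("ensemble bound    =", bound_ensemble)
print("measurement bound =", bound_measurement)
```

For this particular choice the second bound is the tighter of the two, which reflects the fact that the POVM is far from maximally refined while the ensemble states are pure.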

In order to establish a connection between the present
work and Holevo's bound, we use the second form of
Holevo's bound, equation (25). Let us first note that
our compression theorem shows that one can always devise a
measurement A acting collectively on many independent states
whose marginals are close to the POVM a and whose number
of outcomes M is such that $\frac{1}{l}\log M$ is
asymptotically equal to the right hand side of (25).
Thus Holevo's bound and the compression theorem are consistent.

However we can go further and use (24) together with
our compression theorem to derive Holevo's bound. Let
$\{\rho_j, \lambda_j\}$ be any ensemble of states with
average $\rho = \sum_j \lambda_j \rho_j$, and $(S_i)$ a POVM.
With random variables X, Y as defined in the second equality
in (24), Holevo's bound is equivalent to (25). Let us denote
the classical mutual information $I(X \wedge Y)$ by
$I(\{\rho_j, \lambda_j\} \wedge (S_i))$. Now we reverse the
argument from the beginning of this subsection and invent
the POVM a and the ensemble $\{\sigma_i, \mu_i\}$, so that
the first equality in (24) is satisfied. In particular we get

$$ I(\{\sigma_i, \mu_i\} \wedge (a_j)) = I(\{\rho_j, \lambda_j\} \wedge (S_i)), $$

and what we have to prove transforms into

$$ I(\{\sigma_i, \mu_i\} \wedge (a_j)) \le I(\lambda;\rho). $$

Here our compression theorem comes in: define, for any POVM
a, the fidelity function

$$ F(a) = I(\{\sigma_i, \mu_i\} \wedge (a_j)), $$

and for the POVM A on $\mathcal{H}^{\otimes l}$ the fidelity
on blocks

$$ F(A) = \frac{1}{l}\sum_{k=1}^{l} I(\{\sigma_i, \mu_i\} \wedge (A^{(k)}_j)). $$

Observe that this is a nonlinear continuous function of
the POVM (the ensemble $\{\sigma_i, \mu_i\}$ we now consider
as fixed).



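By our compression theorem we find, for $\epsilon > 0$ and
large enough l, writing $\sigma_{i^l} = \sigma_{i_1} \otimes \cdots \otimes \sigma_{i_l}$
and $\mu_{i^l} = \mu_{i_1} \cdots \mu_{i_l}$:

$$
\begin{aligned}
I(\lambda;\rho) + \epsilon
 &\ge \frac{1}{l}\log M \\
 &\ge \frac{1}{l}\, I(\{\sigma_{i^l}, \mu_{i^l}\} \wedge (A_{j^l})) \\
 &= \frac{1}{l}\, I(X^l \wedge Y)
   && [\Pr\{X^l = i^l, Y = j^l\} = \mu_{i^l} \operatorname{Tr}(\sigma_{i^l} A_{j^l})] \\
 &\ge \frac{1}{l}\sum_{k=1}^{l} I(X_k \wedge Y) \\
 &\ge \frac{1}{l}\sum_{k=1}^{l} I(X_k \wedge f_k(Y))
   && [f_k(j^l) = j_k] \\
 &= \frac{1}{l}\sum_{k=1}^{l} I(\{\sigma_i, \mu_i\} \wedge (A^{(k)}_j)) \\
 &= F(A) \\
 &\ge F(a) - \epsilon \\
 &= I(\{\sigma_i, \mu_i\} \wedge (a_j)) - \epsilon.
\end{aligned}
$$

(Only classical information inequalities have been used:
the second line is by data processing, the fourth from
independence of the $X_k$, the fifth by data processing again.)
Because $\epsilon > 0$ was arbitrary, we are done.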
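
The two purely classical steps of this chain, superadditivity of mutual information for independent letters and data processing under the coordinate maps $f_k$, can be illustrated on a small example. The sketch below is an assumption-laden toy: a block of two independent bits and an arbitrary, hand-made conditional distribution for the block outcome; none of the numbers come from the paper.

```python
import math
from itertools import product

def mutual_information(joint):
    """Mutual information in bits of a joint distribution {(u, v): prob}."""
    pu, pv = {}, {}
    for (u, v), p in joint.items():
        pu[u] = pu.get(u, 0.0) + p
        pv[v] = pv.get(v, 0.0) + p
    return sum(p * math.log2(p / (pu[u] * pv[v]))
               for (u, v), p in joint.items() if p > 0.0)

def coarse_grain(joint, fu, fv):
    """Joint distribution of (fu(U), fv(V)) obtained by summing probabilities."""
    out = {}
    for (u, v), p in joint.items():
        key = (fu(u), fv(v))
        out[key] = out.get(key, 0.0) + p
    return out

# Independent, identically distributed letters X_1, X_2 (assumed toy source).
px = {0: 0.6, 1: 0.4}
blocks = list(product((0, 1), repeat=2))

# Arbitrary (hypothetical) conditional distribution of the block outcome Y
# given the block input; any choice would do for the inequalities below.
joint = {}
for x in blocks:
    weights = {y: 1.0 + 2.0 * (y == x) + 1.0 * (y[0] == x[1]) for y in blocks}
    total = sum(weights.values())
    for y, w in weights.items():
        joint[(x, y)] = px[x[0]] * px[x[1]] * w / total

# I(X^l ^ Y) for the whole block.
block_info = mutual_information(joint)

# I(X_k ^ Y): keep the k-th input letter, keep the full outcome.
per_letter = [mutual_information(
                  coarse_grain(joint, lambda x, k=k: x[k], lambda y: y))
              for k in range(2)]

# I(X_k ^ f_k(Y)): keep only the k-th component of the outcome as well.
per_marginal = [mutual_information(
                    coarse_grain(joint, lambda x, k=k: x[k], lambda y, k=k: y[k]))
                for k in range(2)]

print(block_info, sum(per_letter), sum(per_marginal))
```

The first comparison uses independence of the letters, the second uses only that $f_k(Y)$ is a function of $Y$; both hold for any conditional distribution, not just the one chosen here.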



C. Data vs. information 

The above discussion concerning the Holevo bound can 
be used to address the relation between data and (mu- 
tual) information. 

Holevo's bound as usually presented is a function only
of the ensemble $\{\sigma_i, \mu_i\}$ of states emitted by a
source. Maximizing over the measurement, with fixed ensemble,
yields the accessible information at fixed ensemble
$I_{\rm acc}(\mu;\sigma)$. It was shown in [14] that the
accessible information attains the Holevo bound if and only
if all the states that compose the ensemble commute.
Furthermore this gap remains even asymptotically when
one considers measurements on many independent states
emitted by the source because (see [15])

$$ I_{\rm acc}(\mu^{\otimes l};\sigma^{\otimes l}) = l \cdot I_{\rm acc}(\mu;\sigma). $$

On the other hand it is known that one can carry out
block coding and construct an ensemble whose marginals
are distributed in the same way as the original ensemble,
such that for this ensemble the accessible information
approaches the Holevo bound: see [?].

Let us now transcribe these results in terms of measurements,
using the second form of Holevo's bound discussed
above. If one keeps the measurement a fixed and maximizes
over the ensemble (with the average state $\rho$ fixed),
one reaches the accessible information at fixed measurement
and fixed average state, which we denote $I_\rho(a)$. It
follows from the above discussion that $I_\rho(a)$ is strictly
less than $I(\lambda;\rho)$ except if all the $\rho_j$ commute,
and that this gap remains even asymptotically since

$$ I_{\rho^{\otimes l}}(a^{\otimes l}) = l \cdot I_\rho(a). $$

Thus the mutual information at fixed measurement and
fixed average state is in general strictly less than the
amount of meaningful data produced by the measurement.

However, it follows from the results of [?] that
there exists a measurement $\tilde{A}$ acting on the tensor
product $\mathcal{H}^{\otimes l}$ of the Hilbert space of the
composite ensemble, such that its marginals are very close
to a, and such that the accessible information
$I_{\rho^{\otimes l}}(\tilde{A})$ equals $l \cdot I(\lambda;\rho)$
asymptotically.

We conjecture that the POVM A constructed in our compression
theorem has all the properties of $\tilde{A}$ enumerated above.
This would mean that the compressed version A of the
POVM $a^{\otimes l}$ asymptotically closes the gap between
mutual information and amount of data.



D. Open questions 

There remain a number of open questions for future 
research of which we point out a few. The first three 
concern a better understanding of the conditions under 
which we get our result: 

1. In the case where the ensemble on which the measurement
is carried out is composed of mixed states,
can one further decrease the amount of data produced
by the measurement? The results proven in
this paper use condition C3, in which only the average
density matrix $\rho$ of the states enters (through
the definition of the marginal POVMs). However
it is possible that, if one uses the weaker conditions C0,
C1, or C2, the measurements can be further
compressed.

2. Conversely, one could try to prove that further
compression is impossible (our optimality theorem) using
conditions C0, C1 or C2.

3. In the case of a rank-one POVM the entropy defect
in our compression and optimality theorems becomes the
entropy of $\rho$, and the number of outcomes of the
compressed measurement is comparable to the dimension of
the typical subspace of $\rho^{\otimes l}$. Since the
interesting part of the construction is in the typical
subspace we may ask whether one can achieve the bound of
the compression theorem (or a slightly weaker one) by a
von Neumann measurement. The methods used in the present
paper and in [?] do not seem to yield this.



A final question concerns the tradeoff between fidelity
and number of outcomes of A. Here we studied only
the extremal case where the fidelity should be arbitrarily
close to the maximum, but comparison with rate distortion
theory (see for example [17]) makes it plausible
that by allowing a certain loss we can save even more in
the output entropy. This is because on blocks the fidelity
obeys the same form of rule as the typical distortion
measures: it is the average over the block.

Several distortion criteria could be used, e.g.

$$ F(A) \ge F(a) - d, $$

but many others seem natural, too.

A similar tradeoff may occur between the optimum
compression rate and the parameter $g$, $u_l = \lfloor g l \rfloor$,
in condition C5 (here we have treated only the case $g = 0+$).

We intend to pursue these questions in future work.